Abstract



Motivated by the problem of packing Virtual Machines on physical servers in the cloud, we study 
the problem of one-dimensional online stochastic bin packing. Items whose sizes are independent and identically
distributed (i.i.d.) samples from a distribution with integral support arrive as a stream and must be packed on
arrival in bins of size $B$, also an integer. The size of an item is known when it arrives, and the goal is to
minimize the number of non-empty bins (or, equivalently, the waste, defined as the total unused space in
non-empty bins).

Online stochastic bin packing has been extensively studied in the theoretical computer science, combinatorics,
and probability literature, and many heuristics exist. However, all such heuristics are
either optimal only for certain classes of item size distributions, or rely on learning the distribution. The
state-of-the-art Sum of Squares heuristic (Csirik et al. [8]) obtains sublinear (in the number of items seen)
waste for distributions where the expected waste of the optimal offline algorithm is sublinear, but incurs a
constant factor more waste for distributions with linear waste under OPT. In [8], the authors solved this
problem by learning the distribution and solving an LP to inject phantom jobs into the arrival stream.

As our first contribution, we present two distribution-agnostic bin packing heuristics that achieve
additive $O(\sqrt{n})$ waste compared to OPT for all distributions. Our algorithms are essentially gradient
descent on suitably defined Lagrangian relaxations of the bin packing Linear Program. The first algorithm
is very similar to the SS algorithm, but conceptually packs the bins top-down instead of bottom-up.
This motivates our second heuristic, which uses a different Lagrangian relaxation to pack bins bottom-up.
Our heuristics can also be interpreted as iterative Primal-Dual algorithms, and provide a unified view of
the Primal-Dual algorithms used in the stochastic processes, convex optimization and theoretical computer science
communities.

Next, we consider the more general problem of online stochastic bin packing with item departures,
where the time requirement of an item is only revealed when the item departs. Our algorithms extend
as-is to the case of item departures, and we demonstrate their excellent performance experimentally. We
also briefly revisit the Best Fit heuristic, which has not yet been studied in the setting of item departures.



¹University of Chicago. This research was done while the author was a postdoctoral researcher at Google.
²Google, Inc.
1 Introduction



Bin packing is one of the oldest resource allocation problems and has received considerable attention due to 
its practical relevance. In the classical version, a static list of item sizes must be partitioned offline into the fewest
parts, each summing to at most B (the bin size); this problem is NP-hard. In the still harder
online version, the list of item sizes is revealed one at a time, and the items must be irrevocably assigned 
to a bin on arrival. A common approach to get around the NP-hardness obstacle of bin-packing is to make 
some assumptions on the problem instance. In online stochastic bin packing, we assume that the number of 
items to be packed is much larger than the number of items that can fit in a bin, and the item sizes form an 
i.i.d. sequence from some distribution F. 

There has been an extensive study of heuristics for stochastic online bin packing. One line of research has 
focused on performance of common heuristics (e.g., Best Fit (BF)), and identifying item size distributions 
for which these can be optimal, or provably suboptimal. On the algorithmic front, heuristics have been 
proposed which are asymptotically optimal as the number of items grows. However, to the best of our 
knowledge, all known heuristics explicitly learn the item size distribution and at some level involve solving 
a linear program (LP) to tune the heuristic. In this paper we present a simple algorithm that is distribution-agnostic
and asymptotically optimal. Our algorithm is motivated by the Lagrangian relaxation of the bin
packing LP but, surprisingly, has not appeared in the literature.

Our original motivation for reopening the well-studied bin packing problem was scheduling virtual machines
(VMs) in public and private clouds. Typically the bottleneck resource is memory, and requests for
virtual machines specify the amount of memory needed. For QoS guarantees, when a VM is scheduled on
a physical server, the memory reserved for it cannot be used by another VM on the same physical server.
This problem of memory-constrained VM allocation fits the framework of one-dimensional online
stochastic bin packing perfectly. There are, however, two crucial differences. First, unlike items in classical bin packing,
which are permanently assigned to their bins, VMs have a finite execution time (possibly unknown at the
time of decision making) and thus eventually depart, so a bad packing decision is eventually undone.
Second, the possibility of migrating VMs gives additional flexibility to obtain a tighter packing. We
extend our bin packing algorithm to the setting of item departures. In addition to being agnostic to the item
size distribution, our algorithm is also agnostic to the residence times of the items (i.e., the execution times of
VMs). Our algorithm also exhibits good transient properties when the distribution of item sizes changes, a
very desirable feature in practical settings where VM statistics exhibit time-of-day effects.

1.1 Model Notation and Definitions 

There is a sequence of items that are packed online using algorithm $A$. Items can be of different types
$j \in \{1, \ldots, J\}$. The size of a type $j$ item is $s_j$, and the probability that an item is of type $j$ is $p_j$. Thus
$F = (U_F, P_F)$, with $U_F = \{s_1, \ldots, s_J\}$ and $P_F = \{p_1, \ldots, p_J\}$ ($\sum_{j=1}^{J} p_j = 1$), specifies the item size
distribution. Without loss of generality, we assume that items are enumerated in increasing order of their
sizes, i.e., $s_1 < s_2 < \cdots < s_J$. Upon arrival, items are packed into bins of size $B$, $s_J \le B < \infty$. The
pair $(F, B)$ is called the bin packing instance. In this paper we assume that the item sizes $\{s_j\}$ and the bin
size $B$ are integers. We say that a bin has level $h$ if its total content sums to $h$. We assume that there is an
unlimited number of bins, and we use $N_h(n)$, $h = 1, \ldots, B-1$, to denote the number of bins at level $h$
after the $n$th item has been packed.

The performance metric that we want to minimize is the waste incurred by a packing algorithm on distribution
$F$, i.e.,

$$W_F^A(n) = \sum_{h=1}^{B-1} N_h(n)\,(B-h), \tag{1}$$

where $W_F^A(n)$ represents the total accumulated empty space in partially filled bins after $n$ items from distribution
$F$ have been packed by algorithm $A$. We use $W_F^{OPT}(n)$ to denote the waste of an offline optimal
algorithm that is given a list of $n$ i.i.d. samples from $F$. A result of Courcoubetis and Weber [7] proves
that any discrete distribution falls into one of three categories based on $E[W_F^{OPT}(n)]$:

1. Linear Waste (LW): $E[W_F^{OPT}(n)] = \Theta(n)$, e.g., $B = 9$, $U_F = \{2, 3\}$, $P_F = \{0.8, 0.2\}$

2. Perfectly Packable (PP): $E[W_F^{OPT}(n)] = \Theta(\sqrt{n})$, e.g., $B = 9$, $U_F = \{2, 3\}$, $P_F = \{0.75, 0.25\}$

3. PP with Bounded Waste (BW): $E[W_F^{OPT}(n)] = O(1)$, e.g., $B = 9$, $U_F = \{2, 3\}$, $P_F = \{0.5, 0.5\}$
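For two item sizes, the three examples can be checked by hand using the Courcoubetis and Weber characterization as we understand it: a distribution is perfectly packable when its probability vector lies in the cone spanned by the exactly-filling bin configurations, and bounded waste when it lies in the interior of that cone. The sketch below is our own illustration, restricted to two item sizes so that the cone reduces to an interval of ratios $p_1/p_2$; the function names `perfect_configs` and `classify_two_sizes` are hypothetical, and probabilities are assumed to be passed as exact `Fraction`s.

```python
from fractions import Fraction
import math


def perfect_configs(a, b, B):
    """All (x, y) with x, y >= 0 and a*x + b*y == B, i.e. bins filled exactly."""
    configs = []
    for y in range(B // b + 1):
        rest = B - b * y
        if rest % a == 0:
            configs.append((rest // a, y))
    return configs


def ratio(x, y):
    """x / y as an exact Fraction; math.inf stands for y == 0."""
    return math.inf if y == 0 else Fraction(x, y)


def classify_two_sizes(a, b, B, pa, pb):
    """Classify (U_F={a,b}, P_F={pa,pb}) as 'LW', 'PP' or 'BW' (two sizes only).

    The cone of exact configurations is the interval [lo, hi] of ratios x/y.
    """
    configs = perfect_configs(a, b, B)
    if not configs:
        return "LW"                      # no bin can ever be filled exactly
    ratios = [ratio(x, y) for x, y in configs]
    lo, hi = min(ratios), max(ratios)
    r = ratio(pa, pb)
    if r < lo or r > hi:
        return "LW"                      # outside the cone
    if lo < r < hi:
        return "BW"                      # interior of the cone
    return "PP"                          # on the boundary of the cone


F = Fraction
for p in [(F(4, 5), F(1, 5)), (F(3, 4), F(1, 4)), (F(1, 2), F(1, 2))]:
    print(f"P_F = {p[0]}, {p[1]}: {classify_two_sizes(2, 3, 9, *p)}")
# expected: LW, PP, BW for the three examples in the list above
```

For $B = 9$ and sizes $\{2, 3\}$ the exact configurations are 2223 and 333, so the admissible ratios $p_1/p_2$ form the interval $[0, 3]$; $0.8/0.2 = 4$ falls outside it, $0.75/0.25 = 3$ sits on its boundary, and $0.5/0.5 = 1$ is interior.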

In Section 3 we extend our model to the case where items depart after spending some random time in
the system. For simplicity, we assume that items arrive according to a Poisson process with rate $0 < \lambda < \infty$
and leave after i.i.d. exponentially distributed random times, where a type $j$ item has finite mean residence time $1/\mu_j$,
$1 \le j \le J$, and $M_F = \{1/\mu_1, \ldots, 1/\mu_J\}$. In this case $F = (U_F, P_F, M_F)$ describes the item size distribution.
We will be interested both in the steady-state behavior (the waste is parametrized by $\lambda$ in this case) and in the
transient behavior (the rate of convergence to steady state).

1.2 Review of Bin Packing Literature 
Packing with permanent items 

When the bin size is 1 and the item size distribution is uniform on $[0, 1]$, Shor [16] showed that the
expected waste under First Fit (pack in the oldest feasible bin) grows as $\Theta(n^{2/3})$. For Best Fit (pack in the
fullest feasible bin), Leighton and Shor [12] showed this to be $\Theta(n^{1/2}\log^{3/4} n)$. Finally, Shor [15] proposed
a scheme that achieves the lower bound of $\Theta(n^{1/2}\log^{1/2} n)$. For discrete item sizes, when the item sizes are
uniformly distributed over $\{1/B, 2/B, \ldots, J/B\}$, Coffman et al. showed that the expected waste for $J = B$ or
$J = B - 1$ grows as $\Theta(\sqrt{nB})$ for First Fit, and as $\Theta(\sqrt{n}\log B)$ for Best Fit. For $J = B - 2$, bounded expected
waste for Best Fit was shown by Kenyon et al. [11], and for First Fit (using Random Fit as an intermediate
step) by Albers and Mitzenmacher [1]. Kenyon and Mitzenmacher [10] proved that the waste under Best
Fit is linear when $J = \alpha B$ for $\alpha$ in a particular interval and $B$ large enough, and it is conjectured that this holds
for all $\alpha$ above some threshold and below 1.

Sum of Squares (SS) rule [8]: The SS heuristic is in some sense the state-of-the-art bin packing policy
when the item sizes $\{s_j\}$ and the bin size $B$ are integral. It is almost distribution-agnostic, nearly universally
optimal for all distributions $F$, and works as follows. Let $N_h^{SS}(n)$ denote the number of bins of level $h$ after
$n$ items have been packed. The $(n+1)$st item is packed in a feasible bin so as to minimize the potential
function $\sum_{1 \le h < B} N_h^{SS}(n+1)^2$. In [8], it is proved that for PP distributions the waste is indeed $O(\sqrt{n})$.
Further, for BW distributions, the waste of SS is $O(\log n)$, which can be reduced to $O(1)$ by learning the
support of the distribution. However, for linear waste distributions SS incurs a constant factor more waste
than OPT. The authors thus propose to tune the policy by introducing 'phantom' items of size 1 at the correct
rate (the smallest rate at which the distribution becomes perfectly packable), where this rate is determined by
learning the distribution $F$ and solving an LP. From the characterization of bin packing instances by Courcoubetis
and Weber [7], it follows that the distribution-aware SS policy achieves the optimal order of waste.
Our proposed heuristics are unable to obtain better than $O(\sqrt{n})$ waste for bounded waste distributions due to
our emphasis on 'blind' policies. Given the information that the distribution is bounded waste, our heuristics
can be adapted to yield smaller waste, but this is beyond the scope of the present paper.
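The SS rule itself fits in a few lines. The following is our own minimal sketch, not the implementation of [8]: it assumes ties are broken in favour of the lowest level (with $h = 0$ meaning "open a new bin"), tracks only levels $1, \ldots, B-1$, and uses the hypothetical helper names `ss_choose_level` and `pack_ss`. The usage lines run it on the LW instance with $B = 5$ and only size 2 items (the example of Section 2.1), for which OPT wastes about $n/2$.

```python
from collections import Counter


def ss_choose_level(N, s, B):
    """Return the level h (0 = open a new bin) minimizing sum_h N_h^2 after packing.

    N maps levels 1..B-1 to bin counts; full bins (level B) leave the potential.
    Ties are broken towards the lowest level (an assumption of this sketch).
    """
    best_h, best_cost = None, None
    for h in range(0, B - s + 1):
        if h > 0 and N[h] == 0:
            continue                        # no bin at this level to put the item in
        after = Counter(N)
        if h > 0:
            after[h] -= 1
        if h + s < B:
            after[h + s] += 1
        cost = sum(c * c for c in after.values())
        if best_cost is None or cost < best_cost:
            best_h, best_cost = h, cost
    return best_h


def pack_ss(sizes, B):
    """Pack a stream of item sizes with SS; return (level counts, waste)."""
    N = Counter()
    bins = 0
    for s in sizes:
        h = ss_choose_level(N, s, B)
        if h == 0:
            bins += 1
        else:
            N[h] -= 1
        if h + s < B:
            N[h + s] += 1
    return N, bins * B - sum(sizes)


n = 3000
N, waste = pack_ss([2] * n, B=5)
print(N[2], N[4], waste, "vs OPT", n // 2)
```

In this run both $N_2$ and $N_4$ grow linearly, roughly in the ratio $1:2$, so SS wastes about twice as much as OPT on this instance.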

The problem of minimizing scrap in remnant scheduling is a rephrasing of the bin packing problem, for
which Adelman and Nemhauser propose an algorithm that learns the item size distribution and uses the
duals of the packing LP to guide online packing. Rhee and Talagrand [14] also propose a packing heuristic that
uses all the item sizes seen so far to form a bin packing LP relaxation.



Packing with item departures 

While bin packing has not been studied when items both arrive and depart, the model of storage allocation is
perhaps the closest. There is a semi-infinite tape of memory. Requests for blocks of memory arrive online,
and each request must be packed in a contiguous free chunk. The memory requests subsequently free up over
time, causing fragmentation of the free space. The performance metric considered is analogous to waste:
the difference between the largest index of a currently occupied memory bit and the number of currently
used memory bits. Coffman and Leighton [6] analyze this model under Poisson arrivals and departures and
i.i.d. memory request sizes, and propose a distribution-aware scheme that achieves $\Theta(\sqrt{n})$ waste,
where $n$ denotes the expected volume of memory requests in the system in steady state.
The SS heuristic can be extended to perfectly packable distributions even with item departures, and we
expect it to perform well, but it is not clear how the SS algorithm should change for linear waste distributions
(in particular, what the departure dynamics of the 'phantom' size 1 items should be).



1.3 Summary of Contributions and Outline 

A new bin packing heuristic for permanent items: In Section 2 we present our bin packing heuristics for
the case where the item sizes and the bin size are integral, and items never leave the system. Our first heuristic
(PD-quad) is remarkably similar to SS but with a couple of twists: the $n$th arrival is packed so as to minimize
the Lagrangian

$$\mathcal{L}_{quad}(N(n), n) = \sum_{h=1}^{B-1} (B-h)\,N_h(n) + \frac{\epsilon(n)}{2} \sum_{h=1}^{B-1} N_h(n)^2. \tag{2}$$

Further, an arriving item can start a new bin in the middle, i.e., at a nonzero level, leaving an unusable hole at the bottom.

Our second heuristic (PD-exp) packs so as to minimize the Lagrangian

$$\mathcal{L}_{exp}(N(n), n) = \sum_{h=1}^{B-1} (B-h)\,N_h(n) + \frac{B}{\epsilon(n)} \sum_{h=1}^{B-1} e^{-\epsilon(n) N_h(n)}. \tag{3}$$

We prove that for an appropriate choice of $\epsilon(n) = \Theta(1/\sqrt{n})$, both heuristics achieve $W_F^{PD}(n) = W_F^{OPT}(n) +
O(\sqrt{n})$ for all discrete distributions $F$.

Primal-Dual bin packing heuristic for item departures: In Section 3.1 we demonstrate via experiments
that the Best Fit heuristic continues to incur linear waste for some perfectly packable distributions in the setting of
item departures, even when the optimal packing is given as the initial configuration. In Section 3.2 we experimentally
demonstrate that while both proposed heuristics have sublinear suboptimality in steady state (unlike SS),
PD-exp converges to steady state much faster than PD-quad.
2 Bin Packing without Departures



In this section we propose two schemes for stochastic online bin packing without item departures that
achieve additive $O(\sqrt{n})$ suboptimality irrespective of the item size distribution $(U_F, P_F)$. As mentioned
before, there is no known simple, distribution-agnostic bin packing heuristic that performs optimally in
the case of linear waste distributions. In this setting it was shown that the Sum-of-Squares (SS) heuristic
incurs a constant factor more waste than the optimal. All attempts to improve this performance so far have
relied on knowledge of the distribution $P_F$ [?, ?] or on learning it in the process of packing [?]. Our
algorithm fills this gap. We motivate the first algorithm by understanding why SS is suboptimal for Linear
Waste distributions and introducing 'tweaks' to fix its shortcomings. Later we will re-derive the algorithm
as stochastic gradient descent on a suitably defined Lagrangian. Indeed, while our first algorithm resembles
the SS algorithm, this resemblance is purely coincidental. Our proposed algorithm is fundamentally
different from Sum-of-Squares, and the interpretation we offer in Section 2.1 was obtained in hindsight.



2.1 Modifying the SS algorithm for LW distributions 

On the arrival of the $n$th item, of size $s$, SS sends the item to a bin of level $h^*$, where

$$h^* = \arg\min_{0 \le h \le B-s:\ h = 0 \text{ or } N_h(n-1) > 0} \left[ N_{h+s}(n-1) - N_h(n-1) \right],$$

with the convention $N_0 = N_B = 0$ (the choice $h = 0$ opens a new bin). Therefore, SS tries to equalize the number of bins of each level.
This heuristic works for perfectly packable distributions, where the items can be packed without any $N_h$
growing large. However, for LW distributions some $N_h$ must grow as $\Theta(n)$, and the SS heuristic 'pulls along'
the counts of bins at other levels that still have room for more items. E.g., for $B = 5$ and only size 2 items, SS creates $\Theta(n)$
bins of level 4 as well as of level 2.

There seems to be an easy fix to this problem. Since we want to control our heuristic using a sum-of-squares
potential term, we do not want any $N_h$ to grow as $\Theta(n)$. Therefore, we allow bins to be started at any level
$h$. That is,

$$h^* = \arg\min_{0 \le h \le B-s} \left[ N_{h+s}(n-1) - N_h(n-1) \right],$$

and if $N_{h^*} = 0$, we open a new bin, put the item at level $h^*$, and set the level of the bin to $h^* + s$. Therefore,
we create a forbidden hole of size $h^*$ at the bottom.

It is easy to see that this change alone does not work: nothing now prevents us from opening a new bin for
each item and sending it to level $B - s$. We fix this by penalizing items that cause the level of a bin to
reach $B$. Therefore, on the arrival of the $n$th item, of size $s$, we send it to level $h^*$, where

$$h^* = \arg\min_{0 \le h \le B-s} \left[ C \cdot \mathbb{1}\{h = B-s\} + N_{h+s}(n-1) - N_h(n-1) \right],$$

where $C$ is the penalty for 'closing a bin.' We can equivalently view this heuristic as minimizing the cost
function

$$L(N) = C \cdot (\#\text{ level } B \text{ bins}) + \frac{1}{2} \sum_{h=1}^{B} N_h^2,$$

which is a combination of the Best Fit and SS cost functions. In the next section we show that this heuristic
is a special case of Lagrangian gradient descent with $N_h$ representing the dual variables, and present the
optimality result.
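As a quick check of this equivalence (our own derivation, using the convention $N_0 = N_B = 0$ so that new bins and full bins do not enter the sum), consider an item of size $s$ that moves an existing bin from level $h \ge 1$ to level $h + s \le B - 1$:

$$\Delta\Big[\tfrac12 \sum_{h'} N_{h'}^2\Big] = \tfrac12\big[(N_{h+s}+1)^2 - N_{h+s}^2\big] + \tfrac12\big[(N_h-1)^2 - N_h^2\big] = N_{h+s} - N_h + 1.$$

If instead a new bin is opened at level $h$ (so $N_h$ is untouched), or the bin reaches level $B$ (so no level count increases), the corresponding bracket drops out and the change becomes $N_{h+s} + \tfrac12$ or $-N_h + \tfrac12$, respectively. In every case the change in $L(N)$ equals $C \cdot \mathbb{1}\{h = B-s\} + N_{h+s}(n-1) - N_h(n-1)$ plus a constant in $\{0, \tfrac12, 1\}$, so minimizing $L(N)$ and minimizing the bracket in the rule above select the same level except possibly in near-ties.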
2.2 Bin-packing LP and Primal-Dual algorithms

As discussed in Section 5 of [8], given the item size distribution $F$, the optimal waste can be computed by
solving the following LP (called the Waste LP for $P_F$ in [8]):

$$W(F) = \min \sum_{h=1}^{B-1} (B-h) \Big( \sum_{j=1}^{J} v(j, h-s_j) - \sum_{j=1}^{J} v(j, h) \Big) \tag{4}$$

s.t.

$$\sum_{j=1}^{J} v(j, h-s_j) - \sum_{j=1}^{J} v(j, h) \ge 0, \quad h \in \{1, \ldots, B-1\} \tag{5}$$

$$\sum_{h=0}^{B-1} v(j, h) = p_j, \quad j \in \{1, \ldots, J\} \tag{6}$$

$$v(j, h) = 0, \quad j \in \{1, \ldots, J\},\ s_j > B-h \tag{7}$$

$$v(j, h) \ge 0, \quad j \in \{1, \ldots, J\},\ h \in \{0, \ldots, B-1\}, \tag{8}$$

with the convention $v(j, h) = 0$ for $h < 0$. Variables $v(j,h)$, $1 \le j \le J$, $0 \le h \le B-1$, denote the rate at which items of type $j$
are sent to bins of level $h$. Inequalities (5) state that the destruction rate of a level
cannot exceed its creation rate, and the slack (if any) is the rate at which bins of level $h$ accumulate
in the system. The value $W(F)$ of the objective function is the waste per item. Therefore, for perfectly
packable distributions $W(F) = 0$, and $W(F) > 0$ for linear waste distributions.
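As a small worked example (our own illustration, not taken from [8]), take $B = 5$ and a single item type of size $s_1 = 2$ with $p_1 = 1$, the LW instance used in Section 2.1. Writing $v_h = v(1, h)$, constraint (7) forces $v_4 = 0$, and constraints (5) for $h = 1, 2, 3, 4$ read

$$-v_1 \ge 0, \qquad v_0 - v_2 \ge 0, \qquad v_1 - v_3 \ge 0, \qquad v_2 - v_4 \ge 0,$$

so $v_1 = v_3 = 0$ and $v_2 \le v_0$. The objective (4) becomes $3(v_0 - v_2) + 1 \cdot v_2 = 3v_0 - 2v_2$, and with $v_0 + v_2 = 1$ from (6) it equals $5v_0 - 2$, which is minimized at $v_0 = v_2 = \tfrac12$:

$$W(F) = \tfrac12,$$

i.e., half a unit of waste per item, matching the packing that puts two items in every bin and leaves one unit empty.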

Our approach is quite straightforward in hindsight: rather than learning the distribution $P_F$ and then solving
the Waste LP, our packing heuristic mimics a stochastic primal-dual gradient descent solution of the Waste
LP (the stochasticity coming from the random arrival of items). We first present a general template that
unifies the seemingly different notions of primal-dual heuristics in algorithms, stochastic processes, and
optimization under the umbrella of Lagrangian optimization.

2.2.1 A general template for Primal-Dual algorithms 

Consider the following convex problem: 

minimize $f(x)$
subject to $g(x) \le 0$.

The approach of Lagrangian optimization is to convert the constrained optimization problem into an
unconstrained one by imposing a strictly convex increasing penalty function $\Phi$ on the constraint and
moving this penalty into the objective function:

minimize $\mathcal{L}(x) = f(x) + \Phi(g(x))$.

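Further, given the optimal solution $x^*$ to the above optimization problem, an approximation to the value of
the dual $\lambda_g$ for the constraint $g(x) \le 0$ can be obtained by comparing the two first-order conditions

$$\Big[\nabla f + \frac{d\Phi}{dg}\,\nabla g\Big]_{x^*} = 0 \quad \text{and} \quad \big[\nabla f + \lambda_g \nabla g\big]_{x^*} = 0,$$

which gives

$$\lambda_g \approx \frac{d\Phi}{dg}\Big|_{g(x^*)}.$$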

Now, depending on our choice of penalty function $\Phi(\cdot)$, there is a full menu of Lagrangian relaxations, each
with its own mapping of primal to dual variables:

• Quadratic penalty: $\mathcal{L}_{quad}(x) = f(x) + \frac{1}{2\epsilon}\big(g(x)^+\big)^2$. Dual: $\lambda_g = g(x)^+/\epsilon$.

For the dual variables to drive the algorithm, we must violate the primal constraints; that is, we always
have a primal infeasible solution. Primal-dual heuristics with quadratic penalty are common in controlling
queueing systems (e.g., [18], [17]) because queues are nothing but temporary violations of capacity
constraints and map to the corresponding duals. We will see later that the algorithm of Section 2.1 is
indeed solving the Lagrangian relaxation of the Waste LP with a quadratic penalty function.

• Exponential penalty: $\mathcal{L}_{exp}(x) = f(x) + \epsilon\, e^{g(x)/\epsilon}$. Dual: $\lambda_g = e^{g(x)/\epsilon}$.

Under exponential penalty, the dual variables are non-zero even when the primal solution is feasible, and
exponential duals are very popular for worst-case (non-stochastic) online packing and covering problems
(e.g., [?]). The second heuristic we propose corresponds to Lagrangian relaxation with exponential
penalty.

• Log-barrier: $\mathcal{L}_{log}(x) = f(x) - \epsilon \log(-g(x))$. Dual: $\lambda_g = -\epsilon/g(x)$.

The solution is constrained to be always primal feasible, which is why log-barrier relaxations are often used in interior
point algorithms for convex optimization [3]. This would indeed give us a third Primal-Dual bin packing
variant, but currently we cannot prove better than $O(\sqrt{n}\log n)$ additive suboptimality using the log-barrier.

In each case, as $\epsilon \to 0$, the penalty function approaches the barrier penalty, and $\epsilon$ controls the violation of
constraints (for the quadratic penalty) or the loss in the objective function (for the exponential and log-barrier penalties).
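The three penalty/dual pairs can be tabulated numerically. The sketch below simply evaluates the formulas of the menu above for a scalar constraint value $g = g(x)$ (the function names `penalty` and `dual` are ours); it shows that the quadratic dual vanishes at feasible points while the exponential dual stays positive, and that shrinking $\epsilon$ pushes every penalty towards the $0/\infty$ barrier.

```python
import math


def penalty(g, eps):
    """Phi(g) for the quadratic, exponential and log-barrier relaxations."""
    quad = max(g, 0.0) ** 2 / (2 * eps)
    expo = eps * math.exp(g / eps)
    logb = -eps * math.log(-g) if g < 0 else math.inf   # infeasible points are excluded
    return quad, expo, logb


def dual(g, eps):
    """Dual estimate lambda_g = dPhi/dg for the same three relaxations."""
    quad = max(g, 0.0) / eps
    expo = math.exp(g / eps)
    logb = -eps / g if g < 0 else math.inf
    return quad, expo, logb


for eps in (0.1, 0.01, 0.001):
    # one feasible point (g < 0) and one infeasible point (g > 0)
    print(eps, penalty(-0.05, eps), penalty(0.05, eps))
```

At the feasible point $g = -0.05$ the exponential and log-barrier penalties shrink towards 0 as $\epsilon$ decreases, while at the infeasible point $g = 0.05$ the quadratic and exponential penalties blow up.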

2.3 Algorithm PD-quad 

The algorithm PD-quad described below is obtained by imposing a quadratic penalty on the constraints (5)
of the Waste LP, with $\{N_h(t)\}$ representing the slacks, and thus the scaled duals, of these constraints at time $t$. Due to
lack of space, we omit a formal derivation but provide intuition later in the section.

PD-quad: On the $t$th arrival, say of type $j$:

• Choose level $h^*$ to minimize the Lagrangian after packing the $t$th item:

$$h^* = \arg\min_{0 \le h \le B-s_j} \Delta\mathcal{L}_{quad}(t) = \arg\min_{0 \le h \le B-s_j} \Big[ B \cdot \mathbb{1}\{h = B-s_j\} - s_j + \frac{\epsilon(t)}{2} \sum_{h'=1}^{B-1} N_{h'}(t)^2 \Big],$$

where $N(t)$ denotes the level counts that would result from packing the item at level $h$;

  - if a bin of level $h^*$ exists ($N_{h^*}(t-1) > 0$): send the item to such a bin;

  - otherwise: open a new bin and set its level to $h^* + s_j$.

• Update:

  - $N_{h^*+s_j}(t) = N_{h^*+s_j}(t-1) + 1$;

  - $N_{h^*}(t) = [N_{h^*}(t-1) - 1]^+$.
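A direct transcription of the PD-quad steps into Python might look as follows. This is a minimal sketch under our own assumptions: it uses the selection rule as written above (closing penalty $B$, coefficient $\epsilon/2$ on the quadratic term), includes only levels $1, \ldots, B-1$ in the sum, breaks ties towards the lowest level, and measures waste as $B$ times the number of opened bins minus the total item volume, so bottom holes count as waste. The names `pd_quad_choose` and `pack_pd_quad` are hypothetical. The usage runs it on the $B = 5$, size-2 instance with a fixed $\epsilon = 1/\sqrt{n}$.

```python
import math
from collections import Counter


def pd_quad_choose(N, s, B, eps):
    """Level h* in 0..B-s minimizing B*1{h=B-s} - s + eps/2 * sum_{h'<B} N_h'(t)^2."""
    best_h, best_cost = None, None
    for h in range(0, B - s + 1):
        after = Counter(N)
        after[h] = max(after[h] - 1, 0)     # [N_h - 1]^+ ; no bin at h means "open one"
        after[h + s] += 1
        quad = sum(after[k] ** 2 for k in range(1, B))
        cost = B * (h == B - s) - s + eps / 2 * quad
        if best_cost is None or cost < best_cost:
            best_h, best_cost = h, cost
    return best_h


def pack_pd_quad(sizes, B, eps):
    """Pack a stream with PD-quad; return (level counts, waste incl. bottom holes)."""
    N = Counter()
    bins = 0
    for s in sizes:
        h = pd_quad_choose(N, s, B, eps)
        if N[h] > 0:
            N[h] -= 1                       # reuse an existing bin at level h
        else:
            bins += 1                       # new bin with a hole of size h at the bottom
        N[h + s] += 1
    return N, bins * B - sum(sizes)


n = 4000
N, waste = pack_pd_quad([2] * n, B=5, eps=1 / math.sqrt(n))
print(waste, "vs OPT", n // 2)
```

On this instance the sketch ends up closing most bins as "hole of size 1 plus two items", which wastes one unit per two items, exactly like OPT; the excess comes only from the $O(1/\epsilon)$ partially filled bins left open.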



We first analyze the case where the total number of arrivals $n$ is known and $\epsilon(t)$ is fixed, and then the more
general case where $n$ is not known (the open-ended algorithm) and $\epsilon(t)$ varies with $t$.



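Theorem 1 For the PD-quad algorithm with a fixed $\epsilon(t) = \Theta(1/\sqrt{n})$,

$$E\big[W_F^{PD\text{-}quad}(n)\big] \le E\big[W_F^{OPT}(n)\big] + \sqrt{2B^3 n}.$$

Theorem 2 For the PD-quad algorithm with $\epsilon(t) = \Theta(1/\sqrt{t})$,

$$E\big[W_F^{PD\text{-}quad}(n)\big] \le E\big[W_F^{OPT}(n)\big] + \sqrt{4B^3 n}.$$

Proofs for both theorems appear in Appendix A.1.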

Intuition behind PD-quad: Recall that to obtain non-zero duals using a quadratic penalty in the
Lagrangian relaxation, the primal constraints must be violated. However, constraint (5) of the Waste LP says
that we cannot have more items starting at a level than ending at that level. Thus a violation of these constraints
implies that we always have more items above level $h$ than below $h$: we are packing our bins top-down.
In fact, $N_h(t)$ in the description of the above algorithm is the slack for the constraint for level $B - h$ in the
Waste LP (although we do not present the algorithm this way)! This justifies why we can start bins from the
middle: in the Waste LP, bins need not end at level $B$. Also, under the upside-down view of PD-quad, penalizing an item
that causes the level of a bin to reach $B$ is the same as penalizing an item that starts a new bin by going to level 0.

Further discussion: By starting bins at non-zero levels, we introduce wasted space that cannot be used by
future arrivals. This is similar to the mechanism of injecting phantom size 1 items in SS; however, PD-quad
accomplishes this while being distribution-agnostic. Despite the good performance proved in Theorems 1
and 2, this is undesirable when we consider item departures, especially when distributions change over time.
Holes introduced under linear waste distributions cannot be reclaimed when the distribution becomes perfectly
packable until the bin empties, and this can take time polynomial in the arrival rate. This shortcoming
initially motivated the PD-exp algorithm we propose next, where bins are packed bottom-up, and the only
empty space in a bin is at the top, where it is always available to future items.



2.4 Algorithm PD-exp 

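The algorithm PD-exp is a straightforward Lagrangian minimization of the Waste LP with an exponential
penalty function:

PD-exp: On the $t$th arrival, say of type $j$:

• Choose level $h^*$ to minimize the Lagrangian after packing the $t$th item:

$$h^* = \arg\min_{h \le B-s_j:\ h = 0 \text{ or } N_h(t-1) > 0} \Delta\mathcal{L}_{exp}(t) = \arg\min_{h \le B-s_j:\ h = 0 \text{ or } N_h(t-1) > 0} \Big[ B \cdot \mathbb{1}\{h = 0\} + \frac{B}{\epsilon(t)} \sum_{h'=1}^{B-1} e^{-\epsilon(t) N_{h'}(t)} \Big],$$

where $N(t)$ denotes the level counts that would result from packing the item at level $h$ ($h = 0$ opens a new bin).

• Update:

  - $N_{h^*+s_j}(t) = N_{h^*+s_j}(t-1) + 1$;

  - $N_{h^*}(t) = N_{h^*}(t-1) - 1$ (for $h^* \ge 1$).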
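A matching sketch of PD-exp follows, under the same assumptions as before about our reading of the rule (opening penalty $B$, exponential term scaled by $B/\epsilon$), lowest-level tie-breaking, and the hypothetical names `pd_exp_choose` and `pack_pd_exp`. Since bins always start at level 0, no holes are created and the waste is just the empty space at the tops of bins.

```python
import math
from collections import Counter


def pd_exp_choose(N, s, B, eps):
    """Level h* minimizing B*1{h=0} + (B/eps) * sum_{h'<B} exp(-eps * N_h'(t)).

    Candidates: h = 0 (open a new bin) or any h <= B - s with N_h > 0.
    """
    best_h, best_cost = None, None
    for h in range(0, B - s + 1):
        if h > 0 and N[h] == 0:
            continue
        after = Counter(N)
        if h > 0:
            after[h] -= 1
        after[h + s] += 1
        utility = sum(math.exp(-eps * after[k]) for k in range(1, B))
        cost = B * (h == 0) + B / eps * utility
        if best_cost is None or cost < best_cost:
            best_h, best_cost = h, cost
    return best_h


def pack_pd_exp(sizes, B, eps):
    """Pack a stream with PD-exp (bins always start at level 0); return (counts, waste)."""
    N = Counter()
    bins = 0
    for s in sizes:
        h = pd_exp_choose(N, s, B, eps)
        if h == 0:
            bins += 1
        else:
            N[h] -= 1
        N[h + s] += 1
    return N, bins * B - sum(sizes)


n = 4000
N, waste = pack_pd_exp([2] * n, B=5, eps=1 / math.sqrt(n))
print(waste, "vs OPT", n // 2)
```

Here the number of half-filled (level 2) bins settles around $\ln 2/\epsilon$, so the waste stays within $O(\sqrt{n})$ of OPT's $n/2$.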



Theorem 3 For the PD-exp algorithm with a fixed $\epsilon(t) = \Theta(1/\sqrt{n})$ and $n > B$,

$$E\big[W_F^{PD\text{-}exp}(n)\big] \le E\big[W_F^{OPT}(n)\big] + O(\sqrt{n}).$$

Theorem 4 For the PD-exp algorithm with $\epsilon(t) = \Theta\big(1/\sqrt{B+t}\big)$,

$$E\big[W_F^{PD\text{-}exp}(n)\big] \le E\big[W_F^{OPT}(n)\big] + \sqrt{8B^3(n + B)}.$$

Proofs for both theorems appear in Appendix A.2.



In the PD-exp algorithm, $N_h$ is the slack for the level $h$ constraint in the Waste LP and is always non-negative
(the value of the corresponding dual is obtained as $e^{-\epsilon N_h}$). Therefore, we always have a primal feasible
packing. Intuitively, instead of a penalty for creating a bin of level $h$ when $N_h$ is large, the exponential duals
give a positive utility for creating bins of any level $h$, but the utility is decreasing in $N_h$. If we normalize the
bin size to 1, the dependence of the suboptimality of our proposed algorithms on the discretization is $\sqrt{B}$,
which matches that of the Sum-of-Squares algorithm ([8, Theorem 2.5]).

From the proofs of our results, it is also easy to obtain the following corollary (similar to the one proved in
[18]) for the restricted adversarial setting, where an adversary may choose a different distribution $F_t$ at each
time step.

Corollary 1 Let the item at each time step $t$ be generated from a distribution $F_t$ chosen by an adversary,
possibly after observing the packing decisions in the first $t - 1$ time steps. If the waste of $F_t$ for bin size $B$
is $w_t$, then the proposed heuristics achieve waste

$$E\big[W^{PD}(n)\big] \le \sum_{t=1}^{n} w_t + O(\sqrt{n}),$$

where the $O(\sqrt{n})$ terms are as given in Theorems 1 to 4 for the corresponding PD variant.

3 Bin Packing with item departures 

In this section we discuss stochastic online bin packing when items arrive and depart over time. As described
in Subsection 1.1, we assume that items arrive according to a Poisson process with rate $\lambda$, their sizes are
given by $U_F$ and appear with probabilities $P_F$ independently of the arrival times, and their sojourn times
in the system are exponentially distributed with means given by $M_F$. We are concerned with the waste
in steady state as a function of the arrival rate, $E[W_F^A(\lambda)]$, as well as with the rate at which the waste reaches its
steady-state value from an arbitrary initial configuration.
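The departure model is easy to simulate directly. The sketch below is our own illustration, not the simulator used for the figures in this section. It generates Poisson arrivals at rate $\lambda$ and independent exponential residence times with per-type means $M_F$, packs each arrival with Best Fit (the fullest bin that still fits), and returns the bins at time $T$, from which the waste $B \cdot (\text{number of bins}) - (\text{total volume})$ can be read off. The function names and the optional `init` argument (a list of starting bins, each a list of item types) are hypothetical.

```python
import random


def best_fit(bins, sizes, s, B):
    """Index of the fullest bin that can still take an item of size s, or None."""
    best, best_level = None, -1
    for i, b in enumerate(bins):
        level = sum(sizes[j] for j in b)
        if level + s <= B and level > best_level:
            best, best_level = i, level
    return best


def simulate_bf(sizes, probs, means, B, lam, T, init=None, seed=0):
    """Simulate BF with Poisson(lam) arrivals and Exp(mean=means[j]) residence times up to T.

    Each bin is a list of item types; the list of bins at time T is returned.
    """
    rng = random.Random(seed)
    bins = [list(b) for b in (init or [])]
    t = 0.0
    while True:
        # every item present departs at rate 1/mean of its type
        items = [(i, k) for i, b in enumerate(bins) for k in range(len(b))]
        dep_rates = [1.0 / means[bins[i][k]] for i, k in items]
        total = lam + sum(dep_rates)
        t += rng.expovariate(total)
        if t > T:
            return bins
        if rng.random() < lam / total:                    # arrival
            j = rng.choices(range(len(sizes)), weights=probs)[0]
            i = best_fit(bins, sizes, sizes[j], B)
            if i is None:
                bins.append([j])
            else:
                bins[i].append(j)
        else:                                             # departure
            i, k = rng.choices(items, weights=dep_rates)[0]
            del bins[i][k]
            if not bins[i]:
                del bins[i]


sizes, B = [2, 3], 6
bins = simulate_bf(sizes, [0.5, 0.5], [1.0, 1.0], B, lam=200.0, T=10.0, seed=1)
volume = sum(sizes[j] for b in bins for j in b)
print(len(bins), "bins, waste =", B * len(bins) - volume)
```

Starting from, e.g., `init=[[1, 0]] * 100` (bins in configuration 32) reproduces the kind of experiment discussed for the $B = 6$ instance; the rescanning of all bins at each event keeps the sketch short but makes it quadratic, so large $\lambda$ would call for an indexed data structure.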

We begin by revisiting the popular Best Fit (BF) heuristic in Section 3.1. While it is known that Best Fit
can have linear waste for perfectly packable distributions when items do not depart, it is not known whether BF remains
suboptimal in steady state when items arrive and depart. It is natural to expect that, at least for perfectly
packable distributions, once BF reaches the optimal configuration it would stay there. We demonstrate via
experiments that while for certain bin packing instances the performance of Best Fit does improve when
items depart, in general Best Fit can have linear waste for perfectly packable distributions. Furthermore,
depending on the distribution $F$, the evolution of the system state can become slower, while remaining
deterministic, as the arrival rate $\lambda$ increases, and it can take up to $\Theta(\lambda)$ time for the $\Theta(\lambda)$ quantities to settle
to their steady-state values.



Finally, in Section 3.2 we present some experimental and analytical results on the performance of the PD-quad
and PD-exp heuristics under item departures, which show that PD-exp has superior performance in steady
state while possessing a fast rate of convergence. A full analytical treatment will appear in future work.
3.1 Best Fit under item departures



As mentioned before, Kenyon and Mitzenmacher [10] have proved that for stochastic bin packing without
departures, there exist perfectly packable distributions for which the Best Fit heuristic has linear waste. In fact,
the distributions are very simple: $U_F = \{1/B, 2/B, \ldots, J/B\}$ and $P_F = \{1/J, 1/J, \ldots, 1/J\}$. Despite
this, BF continues to be used in practice due to its simplicity and the fact that it does not use any prior
information about the item size distribution.




Figure 1: Simulation results showing the fluid scale convergence to the optimal configuration under the BF
heuristic. The y-axis shows the number of bins of different configurations scaled by the arrival rate $\lambda$. The
model parameters are $B = 6$, $U_F = \{2, 3\}$, $P_F = \{1/2, 1/2\}$.

It is not obvious whether the suboptimality of BF continues to hold when items also depart from bins, since a suboptimal
packing decision is eventually undone. In fact, the present work began as an exploration of the
optimality of BF with item departures. For example, consider the following bin packing instance: the bin size
is $B = 6$, the item sizes are $U_F = \{2, 3\}$, and $P_F = \{1/2, 1/2\}$. When items do not depart, it is straightforward
to prove that this distribution has bounded waste under OPT, but BF incurs linear waste. But for the
same distribution under departures with $M_F = \{1, 1\}$, $W_F^{BF}(\lambda) = o(\lambda)$ in steady state. In fact, starting
from any configuration, BF reaches a configuration with sublinear waste in $O(\log \lambda)$ time under departures,
since the unique fluid stable state for this distribution under BF is the one in which the number of bins not
in configuration 222 or 33 is $o(\lambda)$. Figure 1 shows simulation results for the evolution of the fluid scaled state
for three values of the arrival rate, starting from a configuration with bins only in configuration 32. The fluid
convergence to the optimal state is quite evident. In fact, we can also prove that BF is asymptotically
optimal when, for some integer $k$, the item sizes as well as the bin size are integer powers of $k$.

An exploration of transient characteristics of BF: In Figure 2 we show simulation results for the distribution
$U_F = \{2, 5\}$, $P_F = \{\ldots\}$, $M_F = \{1, 1\}$ and $B = 10$. This distribution is again perfectly packable,
but unlike the previous example, it does not have a unique fluid stable state. Any state in which the only bin
configurations with $\Theta(\lambda)$ bins are 55, 22222 and 522 is 'fluid stable.' The top row of figures shows the fluid
scale evolution of the system state, and it appears that the system has attained steady state at around $t = 8$.
However, if we look at the bottom plot, which is for a longer run with time on a logarithmic scale, we see
that after $t = 8$ the fluid scaled quantities keep evolving. But even in steady state, we have $\Theta(\lambda)$ bins in
configuration 522, thus causing linear waste. We also observe an even more fascinating fact: the steady state
is attained at approximately $t = 100$ for $\lambda = 2 \times 10^4$, and at $t = 1000$ for $\lambda = 2 \times 10^6$. Thus the fluid scale
evolution is followed by another phase in which the fluid scaled quantities evolve at a rate of $O(1/\sqrt{\lambda})$.
Figure 2: Simulation results showing the diffusion scale convergence to steady state under the BF heuristic,
starting from empty. The model parameters are $B = 10$, $U_F = \{2, 5\}$, $P_F = \{\ldots\}$. The top row illustrates
the initial fluid scale evolution, and the bottom row shows both the fluid and the ensuing 'diffusion' scale
evolution.



However, there remain simple PP distributions for which Best Fit has linear waste even when we start from 
the optimal configuration. Figure [3] shows an example with Up = {3, 7} with B = 21. This distribution 
has sublinear waste for any values of Pp and Mp since we can pack items using configurations 777 and 
3333333. In Figure [3} we show the evolution of the system in terms of the number of bins in three selected 
configurations for two values of arrival rate. We make two observations: 

1. Even though we start with a perfect packing, a constant fraction of items start appearing in bins with 
configuration 7733 which is non-perfect. Therefore, BF accumulates linear waste. 

2. The evolution of the system state (scaled by $\lambda$) converges to deterministic sample paths as $\lambda$ increases,
but evolves on a $\Theta(\lambda)$ time scale. Essentially, this means that as we increase the arrival rate, over $\Theta(\lambda)$
events (arrivals/departures) the expected change in the total number of bins of different configurations
remains bounded, i.e., $\Theta(1)$. In fact, simulations for different starting configurations show that for this
distribution under BF, the system state can evolve deterministically on up to three time scales in
succession: first on $\Theta(1)$ time (fluid scale), then on a slower $\Theta(\sqrt{\lambda})$ scale, and finally on a still slower
$\Theta(\lambda)$ scale. To the best of the authors' knowledge, this is the first 'natural' stochastic process that
demonstrates such phase transitions.
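The qualitative behavior described above can be explored with a small discrete-event simulation. The sketch below is a minimal illustration under stated assumptions, not the simulator behind Figures 2 and 3: arrivals are Poisson with rate $\lambda$ (`lam`), sojourn times are exponential with mean 1 (matching $M_F = \{1, 1\}$), item sizes are drawn i.i.d. with weights `probs`, and a bin is closed as soon as it empties. The names `best_fit` and `simulate_best_fit` are hypothetical.

```python
import heapq
import random
from collections import Counter


def best_fit(levels, size, capacity):
    """Return the id of the fullest open bin that can still fit `size`, or None."""
    best, best_level = None, -1
    for b, level in levels.items():
        if level + size <= capacity and level > best_level:
            best, best_level = b, level
    return best


def simulate_best_fit(capacity, sizes, probs, lam, horizon, seed=0):
    """Best Fit with Poisson(lam) arrivals and Exp(1) sojourn times.

    Returns a Counter of the bin configurations present at time `horizon`,
    each encoded as its item sizes in decreasing order (e.g. "7733").
    """
    rng = random.Random(seed)
    contents = {}    # bin id -> list of item sizes currently in the bin
    levels = {}      # bin id -> current level (sum of item sizes)
    departures = []  # min-heap of (departure time, item id, bin id, size)
    next_bin = next_item = 0

    def depart_until(time):
        while departures and departures[0][0] <= time:
            _, _, b, size = heapq.heappop(departures)
            contents[b].remove(size)
            levels[b] -= size
            if not contents[b]:  # an emptied bin is closed
                del contents[b], levels[b]

    t = rng.expovariate(lam)
    while t <= horizon:
        depart_until(t)
        size = rng.choices(sizes, weights=probs)[0]
        b = best_fit(levels, size, capacity)
        if b is None:  # no open bin has room: open a new one
            b, next_bin = next_bin, next_bin + 1
            contents[b], levels[b] = [], 0
        contents[b].append(size)
        levels[b] += size
        heapq.heappush(departures, (t + rng.expovariate(1.0), next_item, b, size))
        next_item += 1
        t += rng.expovariate(lam)
    depart_until(horizon)
    return Counter("".join(map(str, sorted(c, reverse=True))) for c in contents.values())
```

For example, `simulate_best_fit(21, [3, 7], [0.5, 0.5], lam, horizon)` returns a `Counter` keyed by configurations such as `"777"`, `"3333333"` or `"7733"`; tracking these counts over time for increasing $\lambda$ is one way to look for the slow time scales described above, although a plain Python scan over all open bins becomes slow for arrival rates as large as those used in the figures.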
Figure 3: Simulation results showing the state evolution under Best Fit. The packing instance is $B = 21$, $U_F = \{3, 7\}$, $P_F = \{1/2, 1/2\}$, $M_F = \{1, 1\}$.



3.2 PD-quad and PD-exp under item departures 

Our proposed heuristics extend almost as-is to the case of item departures. If an item of size $s$ departs from
a level $h$ bin, we set the level of the bin to $h - s$ and update $N_h$ and $N_{h-s}$ accordingly. For the case of
PD-quad, if a departure causes a bin to become empty, we also remove the hole, if any, at the bottom of the
bin. The parameter $\epsilon(t)$ in the Lagrangian is chosen as per the statements of Theorems 2 and 4, but with $t$
replaced by the number of items present in the system.
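The departure rule can be sketched as follows, assuming each bin is stored as a hole size plus a list of item sizes and that `counts[h]` holds $N_h$. The class `Bin` and the function `depart` are hypothetical names; under PD-exp the hole is always 0, so the same update applies.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Bin:
    hole: int = 0  # unused space at the bottom of the bin (always 0 under PD-exp)
    items: list = field(default_factory=list)

    @property
    def level(self):
        return self.hole + sum(self.items)


def depart(bin_, size, counts):
    """Remove one item of `size` from `bin_`, update counts[h] = N_h, and return the new level."""
    old = bin_.level
    bin_.items.remove(size)
    if not bin_.items:
        bin_.hole = 0  # PD-quad: an emptied bin also loses its hole
    new = bin_.level
    counts[old] -= 1
    if new > 0:
        counts[new] += 1  # the bin moves from level h to level h - s
    return new
```

For instance, a bin with a hole of size 1 and two items of size 2 sits at level 5; one departure moves it to level 3, and the second departure empties it and clears the hole, so it no longer contributes to any $N_h$.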

As alluded to earlier, while both of our proposed heuristics demonstrate desirable steady-state properties, PD-quad
suffers from the drawback that if the initial configuration has bins with holes at the bottom, then we
cannot reclaim that space until the bin empties. For example, suppose $B = 5$, and we start with a distribution
with only items of size 2, and then switch to only items of size 1. Since the first distribution has linear waste,
almost all bins will have a hole of size 1. After the distribution changes to only size 1 items, even though it
is perfectly packable, PD-quad will pack four size 1 items in a bin as the size 2 items depart, and it takes a
long time to recover from this configuration. The next theorem makes this formal.

Theorem 5 Consider a bin packing instance with $B = 5$ and only size 1 items, in which items have mean sojourn
time 1. If we start from an initial configuration with $\lambda/4$ bins with four items per bin and a hole of size 1 at
the bottom, then under the PD-quad heuristic, it takes $\Omega(\sqrt{\lambda/\log\lambda})$ time for the waste to become $o(\lambda)$.

Proof in Appendix A.3.

Figure 4 shows a simulation run comparing the transient properties of SS, PD-quad and PD-exp. In the time
interval [0, 10], items arrive from a perfectly packable distribution. At time t = 10, we switch to a linear
waste distribution, and then at time t = 20 we switch to a different perfectly packable distribution. We
see that SS still shows a constant factor larger waste for the linear waste distribution, but quickly reaches
sublinear waste when we switch to a perfectly packable distribution. PD-quad shows optimal waste for the
linear waste distribution, but takes a long time to converge to steady state after we switch to a perfectly
packable distribution, and this time increases with $\lambda$ (in fact, steady state has not been attained by t = 40).
PD-exp demonstrates both optimal steady-state waste and a quick rate of convergence to steady state
after the item size distribution changes (in time $O(\log\lambda)$).

A rigorous proof of optimality of the proposed heuristics under departures will appear in a forthcoming 
work. 
(a) $\lambda = 2.5 \times 10^4$ (b) $\lambda = 1 \times 10^5$

Figure 4: Simulation results comparing steady-state and transient waste for Sum-of-Squares (SS), and our proposed 
Primal-Dual heuristics. The item size distributions change from perfectly packable to linear waste at t = 10, and then 
back to a different perfectly packable distribution at time t = 20. 

4 Summary and Open Questions 

Our driving goal in this paper was to design simple and optimal-waste heuristics for stochastic online pack- 
ing of items into bins when items arrive and depart over time, with special emphasis on good transient 
properties. We began by revisiting the case of bin packing when items do not depart once packed, and 
proposed two simple distribution-agnostic algorithms motivated by stochastic gradient descent on
Lagrangian relaxations of the bin packing linear program. We show that irrespective of the item-size
distribution, our heuristics achieve sublinear additive suboptimality compared to the offline optimum, improving
over the state-of-the-art Sum-of-Squares heuristic. Our heuristics extend almost as-is to the case of item departures.
We also reveal interesting dynamics and some intricate phase transitions exhibited by the Best Fit
heuristic when jobs depart from the system. Our simulations show that departures can improve the
performance of BF, but not always. A better understanding of BF under departures is a subject of ongoing
research.
A Proofs



A.1 Optimality Analysis of the Primal-Dual Algorithm with Quadratic Penalty Function
Proof of Theorem 1

Recall that we cast our algorithm as the minimization of the following Lagrangian:

$$\mathcal{L}_{\mathrm{quad}}(t) = \underbrace{B \cdot N_B(t) - (\text{total volume of jobs seen})}_{F_A(t)} + \epsilon \cdot \underbrace{\frac{1}{2}\sum_{h=1}^{B-1} N_h(t)^2}_{V_A(t)},$$

where the second term represents the quadratic relaxation of the penalty paid for violating constraint (5), and
$N_h(t)$, $1 \le h \le B$, denotes the number of bins at level $h$ after packing $t$ items. For any packing algorithm
$A$, we call the first term, $F_A$, the primal objective function value, and $V_A$ in the second term the potential
function value.
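A minimal sketch of one greedy PD-quad step, assuming the Lagrangian as written above with $F = B \cdot N_B - \text{volume}$ and $V = \frac{1}{2}\sum_{h<B} N_h^2$. It enumerates every existing partially filled bin and every possible hole size for a new bin, and picks the option with the smallest increase $\Delta F + \epsilon \Delta V$. The function names are hypothetical, and `counts[B]` is assumed to track $N_B$.

```python
from collections import Counter


def pd_quad_place(counts, size, B, eps):
    """One greedy PD-quad step for an item of `size`.

    counts[h] = N_h for 1 <= h < B, and counts[B] = N_B (closed bins).
    Returns (from_level, to_level, hole): from_level is None when a new bin is
    opened with a hole of size `hole` at its bottom.
    Assumes F = B * N_B - volume and V = (1/2) * sum_{h<B} N_h^2.
    """
    def d_potential_add(level):  # change in V when N_level grows by one
        return counts[level] + 0.5 if level < B else 0.0

    def d_primal(to_level):      # F charges B when a bin closes; volume grows by `size`
        return (B if to_level == B else 0) - size

    options = []
    for h in range(1, B - size + 1):     # existing partially filled bins
        if counts[h] > 0:
            to = h + size
            dv = (-counts[h] + 0.5) + d_potential_add(to)
            options.append((d_primal(to) + eps * dv, h, to, 0))
    for hole in range(0, B - size + 1):  # new bin, item placed on top of a hole
        to = hole + size
        options.append((d_primal(to) + eps * d_potential_add(to), None, to, hole))
    _, frm, to, hole = min(options, key=lambda o: o[0])
    return frm, to, hole


def apply_placement(counts, frm, to):
    """Update the level counts after a placement chosen by pd_quad_place."""
    if frm is not None:
        counts[frm] -= 1
    counts[to] += 1
```

Note how the primal term only charges a bin when it reaches level $B$, so closing a bin is chosen only once the quadratic potential makes some level crowded enough; this is the mechanism behind the bound on $N_h(t)$ proved next.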

We first note that we can bound the waste of algorithm $A$ at time $t$ as

$$W_A(t) \le F_A(t) + B\sum_{h=1}^{B-1} N_h(t). \qquad (9)$$

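Next, we prove that $N_h(t) \le \frac{(B+1)h}{\epsilon}$ for $1 \le h \le B-1$. Consider the first time $t$ when this constraint is
violated, and let $h'$ be the violating level. Then some item $j$ is sent to level $h'$ from level $h' - s_j$ (with $N_0 \equiv 0$),
although it could instead have been placed in a new bin on top of a hole of size $B - s_j$, which closes that bin at primal cost $B - s_j$
and leaves the potential unchanged. Hence $-s_j + \epsilon\,(N_{h'}(t-1) - N_{h'-s_j}(t-1) + 1) \le B - s_j$, i.e.,
$\epsilon\,(N_{h'}(t-1) - N_{h'-s_j}(t-1)) < B$. But then

$$N_{h'}(t-1) < \frac{B}{\epsilon} + N_{h'-s_j}(t-1) \le \frac{B + (B+1)(h'-s_j)}{\epsilon} \le \frac{(B+1)h' - 1}{\epsilon}.$$

Therefore, $N_{h'}(t) \le \frac{(B+1)h'}{\epsilon}$ (for $\epsilon \le 1$), a contradiction. By (9), this bound on $N_h(t)$ implies that the waste
due to partially filled bins is at most $B\sum_{h=1}^{B-1}\frac{(B+1)h}{\epsilon} = \frac{B^2(B^2-1)}{2\epsilon} < \frac{B^4}{2\epsilon}$.
Next, we will show that

$$\mathbb{E}[F_{PD}(n)] \le \mathbb{E}[F_{OPT}(n)] + \epsilon n. \qquad (10)$$

Choosing $\epsilon = \Theta(1/\sqrt{n})$ would then establish additive $O(\sqrt{n})$ waste of the PD algorithm.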
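The trade-off in the choice of $\epsilon$ can be checked numerically. The sketch below assumes the two terms derived in this proof, $\epsilon n$ from (10) (taking $V_{PD}(0) = 0$) and $B\sum_{h=1}^{B-1}(B+1)h/\epsilon < B^4/(2\epsilon)$ from the partially filled bins; the helper names are hypothetical. Setting the derivative $n - B^4/(2\epsilon^2)$ to zero gives $\epsilon^* = B^2/\sqrt{2n}$, at which the sum equals $B^2\sqrt{2n}$.

```python
import math


def partial_bin_waste_bound(B, eps):
    """B * sum_{h=1}^{B-1} (B+1) h / eps, using N_h <= (B+1) h / eps in (9)."""
    return B * sum((B + 1) * h / eps for h in range(1, B))


def additive_bound(n, B, eps):
    """eps * n (from (10), with V_PD(0) = 0) plus B^4 / (2 eps) for partially filled bins."""
    return eps * n + B**4 / (2 * eps)


def balancing_eps(n, B):
    """Minimizer of additive_bound: solves n - B^4 / (2 eps^2) = 0."""
    return B**2 / math.sqrt(2 * n)
```

Because the bound is convex in $\epsilon$, doubling or halving $\epsilon^*$ can only increase it, which is a quick sanity check when tuning $\epsilon$ for a known horizon.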



Since the PD algorithm minimizes the increase in the Lagrangian at each time step, for any packing
algorithm $A$:

$$\Delta F_{PD}(t) + \epsilon\,\Delta V_{PD}(t) \le \Delta F_A(t) + \epsilon\,\Delta V_A(t),$$

where the $\Delta$ changes are evaluated with respect to the state at time $t$, and $A$ is some algorithm which we use as
the representative of the optimal algorithm. Taking a telescoping sum of the above inequality gives us:

$$F_{PD}(n) \le \sum_{t=1}^{n}\big(\Delta F_A(t) + \epsilon\,\Delta V_A(t)\big) - \epsilon\,[V_{PD}(n) - V_{PD}(0)] \le \sum_{t=1}^{n}\big(\Delta F_A(t) + \epsilon\,\Delta V_A(t)\big),$$

since $V$ is non-negative and $V_{PD}(0) = 0$. In order to relate $F_{PD}(n)$ to the optimal value $W_F$ of the waste LP (4),
our goal is to find some policy $A$, which can be distribution-aware, but for which $\Delta F_A(t) + \epsilon\,\Delta V_A(t)$ is
small for any state $\{N_1(t), \ldots, N_{B-1}(t)\}$. To achieve this, we modify the policy $A_P$ defined in [8] to prove
the performance of SS for perfectly packable distributions.
Definition 1 ($A_P$ policy (Csirik et al. [8])) Consider an optimal packing $P^*$. Map the arriving item to
an item of the same size in $P^*$ uniformly at random (say in bin $Y$). Given bin $Y$ in the packing $P^*$ and
the current packing $P$, we first find an ordering $y_1, y_2, \ldots, y_{|Y|}$ of the items in $Y$, and a threshold index
$\mathrm{last}(Y)$, $0 \le \mathrm{last}(Y) < |Y|$, such that if we set $S_i = \sum_{j=1}^{i} s(y_j)$, then:

• $P$ has partially filled bins with each level $S_i$, $1 \le i \le \mathrm{last}(Y)$;

• $P$ has no partially filled bins with level $S_{\mathrm{last}(Y)} + s(y_i)$ for any $i > \mathrm{last}(Y)$.

Then, if the arriving item is mapped to an item with index $j$ in the ordering of items in bin $Y$ and
$j > \mathrm{last}(Y)$, it is sent to a partially full bin at level $S_{\mathrm{last}(Y)}$ (or to a new bin if $\mathrm{last}(Y) = 0$). Otherwise, if
$1 < j \le \mathrm{last}(Y)$, we place the item into a bin of level $S_{j-1}$. Finally, if $j = 1$, we create a new bin with level $s(y_1)$.
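The definition only asks for some ordering with these two properties. One simple way to construct it, offered here as an assumption rather than as the construction used in [8], is greedy: starting from level 0, repeatedly append an unused item of $Y$ whose addition lands on a level at which $P$ has a partially filled bin, and stop when no such item remains. Stopping guarantees the second bullet, and every prefix sum visited satisfies the first. The helper names `ap_ordering` and `ap_target` are hypothetical.

```python
def ap_ordering(bin_items, partial_levels):
    """Greedily order the items of an optimal bin Y and compute last(Y).

    bin_items: sizes of the items in bin Y of the optimal packing P*.
    partial_levels: set of levels h (0 < h < B) of partially filled bins in P.
    Returns (ordering, last) such that P has a partially filled bin at every
    prefix sum S_i, i <= last, and at no S_last + s(y_i) for i > last.
    """
    remaining = list(bin_items)
    ordering, level = [], 0
    while True:
        nxt = next((s for s in remaining if level + s in partial_levels), None)
        if nxt is None:
            break
        remaining.remove(nxt)
        ordering.append(nxt)
        level += nxt
    last = len(ordering)
    return ordering + remaining, last


def ap_target(ordering, last, j):
    """Level of the bin receiving the item with 1-based index j (0 means a new bin)."""
    prefix = [0]
    for s in ordering:
        prefix.append(prefix[-1] + s)
    if j > last:
        return prefix[last]  # bin at level S_last, or a new bin if last == 0
    return prefix[j - 1]     # j == 1 gives level 0, i.e., a new bin
```

For example, with $Y = \{3, 2, 5\}$ and partially filled bins at levels 2 and 5, the greedy ordering is 2, 3, 5 with $\mathrm{last}(Y) = 2$, so the item of size 5 is sent to a bin at level 5.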

We now define our modification: 

Definition 2 (Modified $A_P$ policy for PD-quad) Consider an optimal packing $P^*$, but increase the level of
each bin in $P^*$ to $B$ by possibly inserting a hole at the bottom of the bin. Given a bin $Y$ in the optimal packing
$P^*$ with a hole of size $s(-1)$ ($s(-1) = 0$ for perfectly packed $Y$) and the current arbitrary packing $P$, we
first find an ordering $y_1, y_2, \ldots, y_{|Y|}$ of the items in $Y$, and a threshold index $\mathrm{last}(Y)$, $0 \le \mathrm{last}(Y) < |Y|$,
such that if we set $S_i = s(-1) + \sum_{j=1}^{i} s(y_j)$ ($S_0 = 0$), then:

• $P$ has partially filled bins with each level $S_i$, $1 \le i \le \mathrm{last}(Y)$;

• $P$ has no partially filled bins with level $S_{\mathrm{last}(Y)} + s(y_i)$ for any $i > \mathrm{last}(Y)$.

Then, if the arriving item has index $j > 1$ in the ordering of items in bin $Y$ and $j > \mathrm{last}(Y) > 0$, it is sent
to a partially full bin at level $S_{\mathrm{last}(Y)}$. If $\mathrm{last}(Y) = 0$ or $j = 1$, the item is placed in a new empty bin at
level $s(-1)$ to create a bin of level $s(-1) + s(y_j)$. Otherwise, if $1 < j \le \mathrm{last}(Y)$, we place the item into a
bin of level $S_{j-1}$.

Next, we show the following result: 

Proposition 1 Given an arbitrary packing $P$, the probability that $A_P$ packs an item to close a bin (create a
level $B$ bin) is at most the probability that a random item in the optimal packing $P^*$ is placed on the top of
its bin.

Proof: For every bin $Y$ of $P^*$, $A_P$ first creates an ordering of items. Note that conditioned on mapping the
arriving item to bin $Y$ in $P^*$, each item of $Y$ is chosen with equal probability. Now, if $A_P$ sees an item of
size $s$ and maps it to bin $Y$, then it can only create a level $B$ bin if it was mapped to the topmost item in
the ordering. However, there is a chance that it does not create a level $B$ bin because no bin of level $B - s$
exists.

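In view of the previous proposition, we have

$$\Delta F_{PD}(t) + \epsilon\,\Delta V_{PD}(t) \le \Delta F_{A_P}(t) + \epsilon\,\Delta V_{A_P}(t),$$

which, after taking expectations, yields

$$\mathbb{E}[\Delta F_{PD}(t)] - \mathbb{E}[\Delta F_{A_P}(t)] \le \epsilon\,\mathbb{E}[\Delta V_{A_P}(t)] - \epsilon\,\mathbb{E}[\Delta V_{PD}(t)].$$

Then, since $\mathbb{E}[\Delta V_{A_P}] \le 1$ as proved in [8] (this property also holds for the modified $A_P$), we obtain

$$\mathbb{E}[\Delta F_{PD}(t)] - \big[B \cdot p_F - \mathbb{E}[s_F]\big] \le \epsilon - \epsilon\,\mathbb{E}[\Delta V_{PD}(t)],$$

where $p_F$ is the rate at which OPT creates bins, and $\mathbb{E}[s_F]$ is the expected job size under $F$. Finally,

$$\mathbb{E}[F_{PD}(n)] - n \cdot W_F \le \epsilon n + \epsilon\,\mathbb{E}[V_{PD}(0)].$$

By selecting $\epsilon = \Theta(1/\sqrt{n})$ and adding the waste due to the partially full bins, we obtain an overall additive
waste of $O(B^2\sqrt{n})$; for instance, when $V_{PD}(0) = 0$, the choice $\epsilon = B^2/\sqrt{2n}$ balances $\epsilon n$ against the
$B^4/(2\epsilon)$ bound on the partially-full-bin waste and gives $\sqrt{2B^4 n}$. □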



Proof of Theorem 2

The proof of Theorem 1 can easily be modified to show that when $\epsilon(t) = B^2/\sqrt{t}$ we still get $O(\sqrt{n})$ waste,
which ensures performance guarantees in the case where there is no knowledge of the time horizon $n$. From (9),
we get that the waste originating from the partially full bins can be bounded by $B^4/(2\epsilon(n)) = B^2\sqrt{n}/2$.
Similarly as before, at any time $t$, we have:

$$\Delta F_{PD}(t) + \epsilon(t)\big(V_{PD}(t) - V_{PD}(t-1)\big) \le \Delta F_{A_P}(t) + \epsilon(t)\big(V_{A_P}(t) - V_{PD}(t-1)\big).$$

By taking the telescoping sum of the previous inequalities for $1 \le t \le n$, we obtain:

$$F_{PD}(n) - \sum_{t=1}^{n}\Delta F_{A_P}(t) \le \sum_{t=1}^{n}\epsilon(t)\big[V_{A_P}(t) - V_{PD}(t-1)\big] + \sum_{t=1}^{n-1} V_{PD}(t)\big(\epsilon(t+1) - \epsilon(t)\big) + \epsilon(1)V_{PD}(0) - \epsilon(n)V_{PD}(n),$$

which, after taking expectations, noting that $\epsilon(t)$ is decreasing, and replacing $\epsilon(t) = B^2/\sqrt{t}$, yields

$$\mathbb{E}[F_{PD}(n)] - n \cdot W_F \le \sum_{t=1}^{n}\epsilon(t) + \epsilon(1)V_{PD}(0) - \epsilon(n)V_{PD}(n) \le \int_0^n \epsilon(t)\,dt + \epsilon(1)V_{PD}(0) \le 2B^2\sqrt{n} + \epsilon(1)V_{PD}(0).$$

Thus, by adding the waste originating from the partially full bins, we obtain that the total additive waste is
bounded by $2B^2\sqrt{n} + B^2\sqrt{n}/2 + \epsilon(1)V_{PD}(0) = O(B^2\sqrt{n})$.
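The two ingredients of this horizon-free bound can be spot-checked numerically: with the schedule $\epsilon(t) = B^2/\sqrt{t}$, the sum $\sum_{t=1}^{n}\epsilon(t)$ stays below $2B^2\sqrt{n}$, and the partially-full-bin term $B^4/(2\epsilon(n))$ equals $B^2\sqrt{n}/2$. The function names below are hypothetical.

```python
import math


def eps_schedule(t, B):
    """Horizon-free schedule eps(t) = B^2 / sqrt(t)."""
    return B**2 / math.sqrt(t)


def sum_eps(n, B):
    """sum_{t=1}^{n} eps(t), bounded by the integral of B^2 / sqrt(t) over [0, n]."""
    return sum(eps_schedule(t, B) for t in range(1, n + 1))


def partial_bin_waste(n, B):
    """B^4 / (2 eps(n)): the partially-full-bin term evaluated at the horizon n."""
    return B**4 / (2 * eps_schedule(n, B))
```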



A.2 Optimality Analysis of the Primal-Dual Algorithm with Exponential Potential Function
Proof of Theorem 3

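In this case, the Lagrangian function takes the following form:

$$\mathcal{L}_{\exp}(t) = \underbrace{\sum_{h=1}^{B-1}(B-h)\,N_h(t)}_{F_A(t)} + \underbrace{\frac{1}{\epsilon_1}\sum_{h=1}^{B-1} e^{-\epsilon_2 N_h(t)}}_{V_A(t)},$$

implying that its change per single item placement can be bounded (for $0 < \epsilon_2 \le 1$) as

$$\Delta\mathcal{L}_{\exp}(t) \le \sum_{h=1}^{B-1}(B-h)\,\mathrm{sgn}(\Delta N_h) - \sum_{h=1}^{B-1}\frac{e^{-\epsilon_2 N_h(t)}}{\epsilon_1}\big(\epsilon_2\,\mathrm{sgn}(\Delta N_h) - \epsilon_2^2\big). \qquad (11)$$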
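A sketch of one PD-exp step, assuming the Lagrangian above with $F = \sum_{h<B}(B-h)N_h$ and $V = \frac{1}{\epsilon_1}\sum_{h<B} e^{-\epsilon_2 N_h}$: the item either goes into an existing partially filled bin at level $h$ or opens a new bin ($h = 0$), and the exact change in the Lagrangian is evaluated for each option. `pd_exp_place` is a hypothetical name, and ties are broken in favor of the lowest level.

```python
import math
from collections import Counter


def pd_exp_place(counts, size, B, eps1, eps2):
    """One greedy PD-exp step; returns the level h of the receiving bin (0 = new bin).

    counts[h] = N_h. The change in F = sum_{h<B} (B - h) N_h plus
    V = (1/eps1) * sum_{h<B} exp(-eps2 * N_h) is computed exactly for each option.
    """
    def term(level, n):  # contribution of one level to F + V when N_level = n
        if level <= 0 or level >= B:
            return 0.0   # level-B bins have no waste and no potential term
        return (B - level) * n + math.exp(-eps2 * n) / eps1

    best, best_delta = 0, None
    for h in range(0, B - size + 1):
        if h > 0 and counts[h] == 0:
            continue     # no partially filled bin at this level
        to = h + size
        delta = term(to, counts[to] + 1) - term(to, counts[to])
        if h > 0:
            delta += term(h, counts[h] - 1) - term(h, counts[h])
        if best_delta is None or delta < best_delta:
            best, best_delta = h, delta
    return best
```

For example, with $B = 5$, 50 bins at each of levels 2 and 3, and an arriving item of size 2, the sketch prefers the bin at level 2: it creates a bin at the empty level 4, whose large potential drop outweighs closing a level-3 bin.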
Again our goal is to find a (possibly distribution-aware) policy $A$ for which the change in the primal as well
as in the potential term is small. A small change in the primal term means that the policy should open a new bin with a
probability not larger than the long-run rate at which OPT opens bins, irrespective of the state. A small increase in the
potential term means that we prefer creating bins of a scarce level rather than destroying them.



We consider a different modification to policy $A_P$ described in Subsection A.1. If the bin $Y$ of the optimal
packing $P^*$ is perfectly packed, then we follow $A_P$ as described originally by Csirik et al. If the bin $Y$
is non-full, then we modify $A_P$ as follows:

Given a non-full bin $Y$ in the optimal packing $P^*$ and the current arbitrary packing $P$, we
first find an ordering $y_1, y_2, \ldots, y_{|Y|}$ of the items in $Y$, and a threshold index $\mathrm{last}(Y)$,
$0 \le \mathrm{last}(Y) \le |Y|$, such that if we set $S_i = \sum_{j=1}^{i} s(y_j)$, then:

• $P$ has partially filled bins with each level $S_i$, $1 \le i \le \mathrm{last}(Y)$;

• $P$ has no partially filled bins with level $S_{\mathrm{last}(Y)} + s(y_i)$ for any $i > \mathrm{last}(Y)$.

That is, we allow $\mathrm{last}(Y)$ to be equal to $|Y|$. The remainder of the definition is unchanged.



The change in the Lagrangian (11) after a placement of an item with ordering index $j$ can be upper
bounded depending on the case:

• If $j > \mathrm{last}(Y)$: the item is sent to a partially full bin at level $S_{\mathrm{last}(Y)}$ (or to a new bin if $\mathrm{last}(Y) = 0$). This
seems like a bad case because we destroy bins of some level with a higher probability than we create
them, and that increases our potential function (unlike for the SS potential). Also, if $\mathrm{last}(Y) = 0$, then
we open new bins with probability higher than OPT. However, the crucial observation is that in this
process we create a bin of some level which does not exist yet. This causes a sufficiently large drop in
the potential to completely annihilate the increase in the primal term or in the potential term for $S_{\mathrm{last}(Y)}$.
We differentiate between two subcases:

  - $\mathrm{last}(Y) = 0$:
  $$\Delta\mathcal{L}_{\exp} \le -s(y_j) + B + \frac{1}{\epsilon_1}\big(e^{-\epsilon_2} - 1\big) = -s(y_j) + B - \frac{\epsilon_2}{\epsilon_1}\Big(1 - \frac{\epsilon_2}{2!} + \frac{\epsilon_2^2}{3!} - \cdots\Big),$$
  implying that for $\epsilon_2 < 1$
  $$\Delta\mathcal{L}_{\exp} \le -s(y_j) + B - \frac{\epsilon_2}{\epsilon_1}\Big(1 - \frac{\epsilon_2}{2}\Big) = -s(y_j) + B - \frac{\epsilon_2}{\epsilon_1} + \frac{\epsilon_2^2}{2\epsilon_1},$$
  which in conjunction with $\epsilon_2 = B\epsilon_1$ yields
  $$\Delta\mathcal{L}_{\exp} \le -s(y_j) + \frac{B}{2}\,\epsilon_2.$$

  - $\mathrm{last}(Y) > 0$: we send an item to a bin with level $h = S_{\mathrm{last}(Y)}$, with $N_h = n > 0$, and create a bin at an empty level, implying
  $$\Delta\mathcal{L}_{\exp} \le -s(y_j) - \frac{1}{\epsilon_1}\Big[\big(1 - e^{-\epsilon_2}\big) - \big(e^{-(n-1)\epsilon_2} - e^{-n\epsilon_2}\big)\Big] \le -s(y_j).$$

• If $1 \le j \le \mathrm{last}(Y)$: one of these items causes a new bin to open and increases the objective function
term by $B - s(y_1)$, while the others change the objective function by $-s(y_j)$. We now focus on the
change in the potential function due to bins of levels $S_1, \ldots, S_{\mathrm{last}(Y)-1}$. We do not need to look at
$S_{\mathrm{last}(Y)}$, since $j = \mathrm{last}(Y)$ only causes this number to increase and hence the potential to fall, and we
have already accounted for the effect of $j > \mathrm{last}(Y)$ on the Lagrangian. Focusing on this set of
items, the probability of an increase of $N_{S_i}$ is at least as large as the probability of a decrease of $N_{S_i}$.
Therefore, the expected change in the potential function due to level $S_i$ is at most (letting $N_{S_i} = n$ before
the arrival of the tagged item):
$$\frac{1}{2\epsilon_1}\Big[\big(e^{-(n+1)\epsilon_2} - e^{-n\epsilon_2}\big) + \big(e^{-(n-1)\epsilon_2} - e^{-n\epsilon_2}\big)\Big] = \frac{e^{-n\epsilon_2}}{2\epsilon_1}\big[e^{-\epsilon_2} + e^{\epsilon_2} - 2\big] = \frac{e^{-n\epsilon_2}}{\epsilon_1}\Big[\frac{\epsilon_2^2}{2!} + \frac{\epsilon_2^4}{4!} + \frac{\epsilon_2^6}{6!} + \cdots\Big],$$
implying
$$\Delta V_i \le \frac{\epsilon_2^2}{\epsilon_1} \ \text{ for } 0 < \epsilon_2 \le 1, \qquad \text{and} \qquad \Delta V_i \le B\epsilon_2 \ \text{ for } \frac{\epsilon_2}{\epsilon_1} = B.$$
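The three per-case bounds above can be spot-checked numerically. The sketch below evaluates the exact changes in each case on a grid of parameters, assuming $\epsilon_2 = B\epsilon_1$ and $0 < \epsilon_2 \le 1$, and equal probabilities $1/2$ for the increase and decrease of $N_{S_i}$ in the last case; it checks the inequalities on a grid and does not prove them. The function names are hypothetical.

```python
import math


def delta_new_bin(s, B, eps1, eps2):
    """Subcase last(Y) = 0: new bin at an empty level, (B - s) + (e^{-eps2} - 1) / eps1."""
    return (B - s) + (math.exp(-eps2) - 1) / eps1


def delta_to_empty_level(s, n, eps1, eps2):
    """Subcase last(Y) > 0: a bin leaves level S_last (N = n >= 1) for an empty level."""
    gain = 1 - math.exp(-eps2)                             # new level goes from N = 0 to 1
    loss = math.exp(-(n - 1) * eps2) - math.exp(-n * eps2)  # level S_last goes from n to n - 1
    return -s - (gain - loss) / eps1


def symmetric_change(n, eps1, eps2):
    """Expected potential change at level S_i with up/down probabilities 1/2 each."""
    up = math.exp(-(n + 1) * eps2) - math.exp(-n * eps2)
    down = math.exp(-(n - 1) * eps2) - math.exp(-n * eps2)
    return (up + down) / (2 * eps1)
```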

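Now we look at the telescoping sum of changes in the Lagrangian:

$$\Delta F_{PD}(t) + \Delta V_{PD}(t) \le \Delta F_{A_P}(t) + \Delta V_{A_P}(t),$$

which after taking expectations yields

$$\mathbb{E}[\Delta F_{PD}(t)] \le \mathbb{E}[\Delta F_{A_P}(t)] + \mathbb{E}[\Delta V_{A_P}(t)] - \mathbb{E}[\Delta V_{PD}(t)] \qquad (12)$$

or

$$\mathbb{E}[\Delta F_{PD}(t)] \le \big[B \cdot p_F - \mathbb{E}[s_F]\big] + B\epsilon_2 - \mathbb{E}[\Delta V_{PD}(t)],$$

where, similarly as before, $p_F$ is the rate at which OPT creates bins and $\mathbb{E}[s_F]$ is the expected job size under
$F$. After summing across the time horizon $1 \le t \le n$, we obtain

$$\mathbb{E}[F_{PD}(n)] \le n \cdot W_F + Bn\epsilon_2 + \mathbb{E}[V_{PD}(0)]$$

and, since $V_{PD}(0) \le (B-1)/\epsilon_1$,

$$\mathbb{E}[F_{PD}(n)] \le n \cdot W_F + Bn\epsilon_2 + \frac{B-1}{\epsilon_1}. \qquad (13)$$

Assuming that $\epsilon_2 = B\epsilon_1$ and $\epsilon_1 = 1/\sqrt{Bn}$, we derive

$$\mathbb{E}[F_{PD}(n)] \le n \cdot W_F + 2\sqrt{B^3 n},$$

where $W_F$ is the optimal value of the waste LP.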
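A quick numerical check of the last step, assuming the choice $\epsilon_1 = 1/\sqrt{Bn}$, $\epsilon_2 = B\epsilon_1$ and the bound $V_{PD}(0) \le (B-1)/\epsilon_1$: the extra terms $Bn\epsilon_2 + (B-1)/\epsilon_1$ equal $\sqrt{B^3 n} + (B-1)\sqrt{Bn}$, which is below $2\sqrt{B^3 n}$, and $\epsilon_2 = \sqrt{B/n}$ stays at most 1 whenever $n \ge B$. The function name is hypothetical.

```python
import math


def pd_exp_bound_terms(n, B):
    """Return (B n eps2 + (B - 1)/eps1, eps2) for eps1 = 1/sqrt(B n) and eps2 = B eps1."""
    eps1 = 1 / math.sqrt(B * n)
    eps2 = B * eps1
    return B * n * eps2 + (B - 1) / eps1, eps2
```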



Proof of Theorem 4

The proof carries through similarly as in (12)-(13), with minimal changes in the last step. Changes in the
potential term of the PD algorithm do not telescope any more. In particular, the potential function keeps
decreasing since $\epsilon(t)$ decreases. Note that in this case, we fix $\epsilon_2(t) = \epsilon(t)$ and $\epsilon_1(t) = \epsilon(t)/B$.
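The sum of potential differences $\Delta V_{PD}(t)$ equals (writing $n_h(t)$ for $N_h(t)$):

$$\sum_{t=1}^{n}\Delta V_{PD}(t) = B\sum_{h=1}^{B-1}\sum_{t=1}^{n}\frac{1}{\epsilon(t)}\Big(e^{-\epsilon(t)n_h(t)} - e^{-\epsilon(t)n_h(t-1)}\Big) = B\sum_{h=1}^{B-1}\left[\frac{e^{-\epsilon(n)n_h(n)}}{\epsilon(n)} - \frac{e^{-\epsilon(1)n_h(0)}}{\epsilon(1)} + \sum_{t=1}^{n-1}\left(\frac{e^{-\epsilon(t)n_h(t)}}{\epsilon(t)} - \frac{e^{-\epsilon(t+1)n_h(t)}}{\epsilon(t+1)}\right)\right].$$

Since each summand of the inner sum in the previous expression is minimized when $n_h(t) = 0$ (because $\epsilon(t+1) \le \epsilon(t)$), and the first term is non-negative, we obtain

$$\sum_{t=1}^{n}\Delta V_{PD}(t) \ge B\sum_{h=1}^{B-1}\left[-\frac{e^{-\epsilon(1)n_h(0)}}{\epsilon(1)} + \sum_{t=1}^{n-1}\left(\frac{1}{\epsilon(t)} - \frac{1}{\epsilon(t+1)}\right)\right] \ge -\frac{B(B-1)}{\epsilon(n)},$$

which, after selecting $\epsilon(t) = k/\sqrt{t+a}$, implies

$$\sum_{t=1}^{n}\Delta V_{PD}(t) \ge -\frac{B(B-1)\sqrt{n+a}}{k}.$$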
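The lower bound $\sum_t \Delta V_{PD}(t) \ge -B(B-1)/\epsilon(n)$ does not depend on how the counts evolve, only on $\epsilon(t)$ being positive and non-increasing and on $n_h(t) \ge 0$. The sketch below spot-checks it on random non-negative trajectories with $\epsilon(t) = k/\sqrt{t+a}$ and $\epsilon_1(t) = \epsilon(t)/B$, so that $1/\epsilon_1(t) = B/\epsilon(t)$; the trajectory generator and both function names are hypothetical.

```python
import math
import random


def potential_sum(traj, B, k, a):
    """Sum over t = 1..n of (B / eps(t)) * sum_h [exp(-eps(t) n_h(t)) - exp(-eps(t) n_h(t-1))].

    traj[t] lists n_h(t) for h = 1..B-1 and t = 0..n; eps(t) = k / sqrt(t + a).
    """
    total = 0.0
    for t in range(1, len(traj)):
        eps = k / math.sqrt(t + a)
        total += (B / eps) * sum(math.exp(-eps * x) - math.exp(-eps * y)
                                 for x, y in zip(traj[t], traj[t - 1]))
    return total


def random_trajectory(n, B, rng, max_count=30):
    """Test input only: arbitrary non-negative counts for each level at every step."""
    return [[rng.randint(0, max_count) for _ in range(B - 1)] for _ in range(n + 1)]
```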



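Next, (13) and $\epsilon_1(t) = \epsilon(t)/B$ imply

$$\sum_{t=1}^{n}\mathbb{E}\big[\Delta F_{A_P}(t) + \Delta V_{A_P}(t) - W_F\big] \le B\sum_{t=1}^{n}\epsilon(t) = B\sum_{t=1}^{n}\frac{k}{\sqrt{t+a}} \le Bk\left[\frac{1}{\sqrt{a+1}} + \int_{a+1}^{n+a}\frac{dx}{\sqrt{x}}\right] = Bk\left[\frac{1}{\sqrt{a+1}} + 2\sqrt{n+a} - 2\sqrt{a+1}\right] < 2Bk\sqrt{n+a}.$$

Therefore, the previous expression and the lower bound on $\sum_t \Delta V_{PD}(t)$ give an upper bound for the overall additive waste

$$2Bk\sqrt{n+a} + B(B-1)\frac{\sqrt{n+a}}{k}, \qquad (14)$$

which, by setting $k = \sqrt{B/2}$ and $a = B$, guarantees $\epsilon(t) \le 1$ and

$$\mathbb{E}[F_{PD}(n)] \le n \cdot W_F + \sqrt{8B^3(n+B)}.$$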
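With $k = \sqrt{B/2}$ and $a = B$, the bound (14) equals $\sqrt{2B}\,(2B-1)\sqrt{n+B}$, which is at most $\sqrt{8B^3(n+B)}$, and the largest step size is $\epsilon(0) = k/\sqrt{a} = 1/\sqrt{2}$. The short sketch below checks both facts over a range of $B$ and $n$; the function names are hypothetical.

```python
import math


def anytime_bound(n, B):
    """Bound (14) with k = sqrt(B/2) and a = B."""
    k, a = math.sqrt(B / 2), B
    return 2 * B * k * math.sqrt(n + a) + B * (B - 1) * math.sqrt(n + a) / k


def max_eps(B):
    """eps(t) = k / sqrt(t + a) is largest at t = 0, where it equals k / sqrt(a)."""
    return math.sqrt(B / 2) / math.sqrt(B)
```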

A.3 Proof of Theorem 5

In the chosen example, the steady-state number of items is distributed as a Poisson random variable with mean
$\lambda$, and the initial configuration to PD-quad places all these items in bins with a hole of size 1 at the bottom. We
now show that if we fix a random bin which is initially in this configuration, then it takes $\Omega(\sqrt{\lambda/\log\lambda})$ time
in expectation for this bin to empty, and, thus, for the PD-quad algorithm to recover the forbidden hole of
size 1.

As before, let $N_h(t)$ denote the number of level $h$ bins at time $t$. We first make the following claim on the
supremum of $N_h(t)$. The claim follows from known results on the supremum of mean-reverting stochastic
processes (Ornstein-Uhlenbeck process), and we omit its proof for clarity.
Claim 1 Over a time interval of length poly($\lambda$), the supremum of $N_h(t)$ ($1 \le h \le 4$) is bounded above by
$\alpha\sqrt{\lambda\log\lambda}$ for some constant $\alpha > 0$ with probability $1 - O(e^{-\lambda})$.

Next, we lower bound the mean time it takes a level 5 bin (with four items of size 1) to reach the state with
only two items, conditioned on the event of the claim. Let $T_{4\to3}$ denote the mean time until this bin reaches
level 4 (with three items), and let $T_{3\to2}$ denote the mean time until this bin reaches level 3 for the first time
after reaching level 4. Since the arrivals are Poisson and residence times are i.i.d. and exponential, the system
evolves according to a Markov chain.
It is easy to see that $T_{4\to3} = 1/4$, since a level 5 bin accepts no arrivals and each of its four items departs at rate 1.

When the bin reaches level 4, it transitions to a level 3 bin (with rate 3) in case of a departure, or to a level 5
bin in case of an external arrival. The crucial observations are:

• $N_4(t)$ is upper bounded by $\alpha\sqrt{\lambda\log\lambda}$;

• Since level 4 bins are created at rate $\lambda$ due to departures from level 5 bins, they also get destroyed at
rate $\lambda$ due to external arrivals, which increase their level to 5.

Since an arrival picks a level 4 bin at random (conditioned on the arrival choosing a level 4 bin), our tagged
bin gets picked with probability at least $1/(\alpha\sqrt{\lambda\log\lambda})$. Therefore, the rate at which this bin sees an arrival to
become a level 5 bin is $r_{4\to5} \ge \lambda/(\alpha\sqrt{\lambda\log\lambda}) = \sqrt{\lambda/(\alpha^2\log\lambda)}$. We now write the recurrence for $T_{3\to2}$:

$$T_{3\to2} = \frac{1}{3 + r_{4\to5}} + \frac{r_{4\to5}}{3 + r_{4\to5}}\big(T_{4\to3} + T_{3\to2}\big),$$

or,

$$T_{3\to2} = \frac{1}{3} + \frac{r_{4\to5}}{12} \ge \frac{1}{3} + \frac{1}{12}\sqrt{\frac{\lambda}{\alpha^2\log\lambda}}.$$

Therefore, for any bin of the initial configuration, it takes $\Omega(\sqrt{\lambda/\log\lambda})$ time to empty out, and hence the
waste under the PD-quad algorithm remains linear for at least $\Omega(\sqrt{\lambda/\log\lambda})$ time.
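The recurrence can be checked on the isolated tagged-bin chain. The sketch below treats the arrival rate $r = r_{4\to5}$ seen by the tagged level-4 bin as a fixed constant, which is a simplifying assumption (in the actual system this rate depends on $N_4(t)$), and compares the closed form $T_{3\to2} = (1 + r\,T_{4\to3})/3$ with $T_{4\to3} = 1/4$ against a Monte Carlo estimate. The function names are hypothetical.

```python
import random
from fractions import Fraction


def exact_t32(r):
    """Closed form of the recurrence: T_{3->2} = (1 + r * T_{4->3}) / 3 with T_{4->3} = 1/4."""
    return (1 + Fraction(r) / 4) / 3


def simulate_t32(r, samples=20000, seed=0):
    """Monte Carlo time for a tagged level-4 bin (three unit items) to drop to two items."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        t, items = 0.0, 3
        while items > 2:
            if items == 4:   # level 5 (full): only departures, total rate 4
                t += rng.expovariate(4.0)
                items = 3
            else:            # three items: a departure (rate 3) or an arrival (rate r)
                t += rng.expovariate(3.0 + r)
                items = 4 if rng.random() < r / (3.0 + r) else 2
        total += t
    return total / samples
```

Since $T_{3\to2}$ grows linearly in $r$, any lower bound on the arrival rate seen by the tagged bin translates directly into a lower bound on the time it takes to empty.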